Selected Development Project
 
Project Title

Suffix Array-Based Index for Use in Bioinformatics

 
Principal Investigator Dr CHAN Wai Hong
 
Area of Research Project
Social Development
 
Project Period
From 01/2013 To 12/2015
Objectives
  • Further develop our proposed linear-time suffix array (SA) construction algorithms to build huge suffix arrays in external memory
  • Investigate new solutions for updating suffix arrays under different operations, namely insertion, deletion, and substitution
  • Investigate fast algorithms for building and using huge SA-based indices for bioinformatics applications, capable of running on clusters of general-purpose machines and/or GPUs
  • Build a C++ library of the developed algorithms and a test-bed system for sample applications of these algorithms
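
For readers unfamiliar with the data structure, the sketch below shows what a suffix array is: the start positions of all suffixes of a text, listed in lexicographic order of the suffixes. It is a naive in-memory baseline that sorts the suffixes directly, not the project's linear-time or external-memory algorithms. The function name build_suffix_array_naive is hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <string_view>
#include <vector>

// Illustrative baseline only (hypothetical name, not the project's library).
// Sorts suffix start positions by comparing the suffixes themselves, which
// costs O(n^2 log n) in the worst case and keeps everything in main memory.
std::vector<int> build_suffix_array_naive(const std::string& text) {
    std::vector<int> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);  // candidate start positions 0..n-1
    std::string_view view(text);         // avoids copying suffixes while sorting
    std::sort(sa.begin(), sa.end(), [view](int a, int b) {
        return view.substr(a) < view.substr(b);
    });
    return sa;
}
```

For the text "banana" this yields the positions 5, 3, 1, 0, 4, 2, corresponding to the sorted suffixes "a", "ana", "anana", "banana", "na", "nana". The quadratic comparison cost and the in-memory requirement of such a baseline are the kind of limits that motivate linear-time and external-memory construction for huge inputs.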
Methods Used
  • Studying external-memory design variants of the suffix array algorithms we developed, so that general-purpose computers can build suffix arrays for huge databases
  • Investigating the efficiency of reconstructing the SAs of dynamically changing strings based on the existing induced-sorting algorithms
  • Designing new SA-based indices for huge data sets, and investigating concrete algorithms on these indices for the specific purposes of different bioinformatics applications
Summary of Findings
  • SA construction algorithm(s) using external memory, and software tools in C++ built on the algorithm(s)
  • Adaptive reconstruction algorithm(s) for SAs of dynamically changing strings
  • Algorithms for matching, clustering and sequence alignment in bioinformatics using huge SA-based indices
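
As a minimal illustration of how an SA-based index supports matching, the sketch below finds every occurrence of a pattern by binary search over a suffix array. It assumes the suffix array for the text has already been built and fits in memory. The function name find_occurrences is hypothetical, and the sketch does not represent the project's actual algorithms for clustering or sequence alignment.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch only (hypothetical name). Because the suffixes are in
// sorted order, all suffixes that start with `pattern` occupy one contiguous
// range of the suffix array, which two binary searches can locate.
std::vector<int> find_occurrences(const std::string& text,
                                  const std::vector<int>& sa,
                                  const std::string& pattern) {
    const std::size_t m = pattern.size();
    // First suffix whose first m characters are not smaller than the pattern.
    auto lo = std::lower_bound(sa.begin(), sa.end(), pattern,
        [&](int pos, const std::string& p) { return text.compare(pos, m, p) < 0; });
    // First suffix after that whose first m characters are larger than the pattern.
    auto hi = std::upper_bound(lo, sa.end(), pattern,
        [&](const std::string& p, int pos) { return text.compare(pos, m, p) > 0; });
    std::vector<int> hits(lo, hi);
    std::sort(hits.begin(), hits.end());  // report positions in text order
    return hits;
}
```

With the text "banana" and its suffix array 5, 3, 1, 0, 4, 2, searching for "ana" returns the positions 1 and 3. Each query performs O(log n) suffix comparisons of at most m characters each, which is why a prebuilt index pays off when many patterns are matched against the same large database.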
Impact
  • Instead of requiring costly supercomputers, SAs of huge databases can be constructed on general-purpose computers using external memory
  • Updating an SA will become an easier task, both theoretically and practically. Reconstructing the SA of a huge database while the database is being modified will be more time- and space-efficient
  • New scalable SA-based indices and algorithms for different bioinformatics applications will be developed, so that more biologists can carry out research on much larger bioinformatics databases in low-cost settings

Output
  • Constructing suffix arrays in external memory using d-critical substrings
  • Induced sorting suffixes in external memory
  • On the efficiency of reconstruction algorithms for SAs of dynamically changing strings
  • Algorithms for specific applications in bioinformatics using huge SA-based indices
  • An object-oriented routine library in C++, and a software test-bed system
Biography of Principal Investigator

Education

2003         Ph.D. in Mathematics, Hong Kong Baptist University

1999         Pg.Dip. in Education, The Chinese University of Hong Kong

1996         M.Phil. in Mathematics, Hong Kong Baptist University

1994         B.Sc. in Mathematical Science, Hong Kong Baptist University

Work Experience

2014-present       Associate Professor, Department of Mathematics and Information Technology, HKIEd

2011-2014           Assistant Professor, Department of Mathematics and Information Technology, HKIEd

2010-2011           Senior Academic Advisor, Beacon Group

2006-2010           Senior Lecturer, Department of Mathematics, HKBU

2006                    College Senior Lecturer, HKU SPACE Community College

2002-2006           College Lecturer, HKU SPACE Community College

2001-2002           Instructor, College of International Education, HKBU

1996-1999           Graduate Master, Sha Tin Methodist College

Professional Experience

2000-2002           Course Developer and Academic Coordinator, Mathematics in Practice, Project Yi Jin, Federation for Continuing Education in Tertiary Institutions

Funding Source

General Research Fund